Hengshuang Zhao

Any3D-VLA: Enhancing VLA Robustness via Diverse Point Clouds

Jan 31, 2026

CoDance: An Unbind-Rebind Paradigm for Robust Multi-Subject Animation

Jan 16, 2026

GDRO: Group-level Reward Post-training Suitable for Diffusion Models

Jan 05, 2026

Alchemist: Unlocking Efficiency in Text-to-Image Model Training via Meta-Gradient Data Selection

Dec 18, 2025

In Pursuit of Pixel Supervision for Visual Pre-training

Dec 17, 2025

MemFlow: Flowing Adaptive Memory for Consistent and Efficient Long Video Narratives

Dec 16, 2025

DrivePI: Spatial-aware 4D MLLM for Unified Autonomous Driving Understanding, Perception, Prediction and Planning

Dec 14, 2025

GenieDrive: Towards Physics-Aware Driving World Model with 4D Occupancy Guided Video Generation

Dec 14, 2025

Wan-Move: Motion-controllable Video Generation via Latent Trajectory Guidance

Dec 09, 2025

Seg-VAR: Image Segmentation with Visual Autoregressive Modeling

Nov 16, 2025